A Generic Approach to Bulk Loading Multidimensional Index Structures
نویسندگان
چکیده
Recently there has been an increasing interest in supporting bulk operations on multidimensional index structures. Bulk loading refers to the process of creating an initial index structure for a presumably very large data set. In this paper, we present a generic algorithm for bulk loading which is applicable to a broad class of index structures. Our approach differs completely from previous ones for the following reasons. First, sorting multidimensional data according to a predefined global ordering is completely avoided. Instead, our approach is based on the standard routines for splitting and merging pages which are already fully implemented in the corresponding index structure. Second, in contrast to inserting records one by one, our approach is based on the idea of inserting multiple records simultaneously. As an example we demonstrate in this paper how to apply our technique to the R-tree family. For R-trees we show that the I/O performance of our generic algorithm meets the lower bound of external sorting. Empirical results demonstrate that performance improvements are also achieved in practice without sacrificing query performance.
منابع مشابه
An Evaluation of Generic Bulk Loading Techniques
Bulk loading refers to the process of creating an index from scratch for a given data set. This problem is well understood for B-trees, but so far, non-traditional index structures received modest attention. We are particularly interested in fast generic bulk loading techniques whose implementations only employ a small interface that is satisfied by a broad class of index structures. Generic te...
متن کاملThe Bulk Index Join: A Generic Approach to Processing Non-Equijoins
Efficient join algorithms have been developed for processing different types of non-equijoins like spatial join, band join, temporal join or similarity join. Each of these previously proposed join algorithms is tailor-cut for a specific type of join, and a generalization of these algorithms to other join types is not obvious. We present an efficient algorithm called bulk index join that can be ...
متن کاملSpace-Partitioning-Based Bulk-Loading for the NSP-Tree in Non-ordered Discrete Data Spaces
Properly-designed bulk-loading techniques are more efficient than the conventional tuple-loading method in constructing a multidimensional index tree for a large data set. Although a number of bulkloading algorithms have been proposed in the literature, most of them were designed for continuous data spaces (CDS) and cannot be directly applied to non-ordered discrete data spaces (NDDS). In this ...
متن کاملBulk-Loading the ND-Tree in Non-ordered Discrete Data Spaces
Applications demanding multidimensional index structures for performing efficient similarity queries often involve a large amount of data. The conventional tuple-loading approach to building such an index structure for a large data set is inefficient. To overcome the problem, a number of algorithms to bulk-load the index structures, like the Rtree, from scratch for large data sets in continuous...
متن کاملEfficient Bulk Loading of Large High-Dimensional Indexes
Efficient index construction in multidimensional data spaces is important for many knowledge discovery algorithms, because construction times typically must be amortized by performance gains in query processing. In this paper, we propose a generic bulk loading method which allows the application of user-defined split strategies in the index construction. This approach allows the adaptation of t...
متن کامل